
DEPR: deprecate SparseArray.values #26421

Merged


jorisvandenbossche
Member

Having a .values attribute on SparseArray is confusing, as .values is typically used on Series/DataFrame/Index and not on the array classes.
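A minimal sketch of the usual deprecation pattern, assuming a plain Python class. The old attribute becomes a property that emits a FutureWarning and then delegates to the explicit `to_dense()` method. `ToySparseArray` and the warning text are hypothetical stand-ins, not the code from this PR.

```python
import warnings


class ToySparseArray:
    """Hypothetical stand-in for a sparse array (not pandas' SparseArray).

    Only the non-fill values and their positions are stored.
    """

    def __init__(self, sp_values, sp_index, length, fill_value=0):
        self.sp_values = list(sp_values)  # stored (non-fill) values
        self.sp_index = list(sp_index)    # positions of the stored values
        self.length = length
        self.fill_value = fill_value

    def to_dense(self):
        # Start from an all-fill list and scatter the stored values back in.
        dense = [self.fill_value] * self.length
        for pos, val in zip(self.sp_index, self.sp_values):
            dense[pos] = val
        return dense

    @property
    def values(self):
        # Deprecated alias (hypothetical message): warn, then delegate.
        warnings.warn(
            "SparseArray.values is deprecated and will be removed in a "
            "future version. Use to_dense() instead.",
            FutureWarning,
            stacklevel=2,
        )
        return self.to_dense()


arr = ToySparseArray(sp_values=[1, 2], sp_index=[0, 3], length=5)
print(arr.to_dense())  # [1, 0, 0, 2, 0]
```

Callers of `.values` get the same result as before, plus a FutureWarning attributed to their own line (via `stacklevel=2`), which gives them time to switch to `to_dense()`.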

@jorisvandenbossche jorisvandenbossche added the Deprecate Functionality to remove in pandas label May 16, 2019
@jorisvandenbossche jorisvandenbossche added this to the 0.25.0 milestone May 16, 2019

codecov bot commented May 16, 2019

Codecov Report

Merging #26421 into master will decrease coverage by <.01%.
The diff coverage is 66.66%.


@@            Coverage Diff             @@
##           master   #26421      +/-   ##
==========================================
- Coverage   91.69%   91.68%   -0.01%     
==========================================
  Files         174      174              
  Lines       50741    50743       +2     
==========================================
- Hits        46529    46526       -3     
- Misses       4212     4217       +5

Flag        Coverage Δ
#multiple   90.19% <66.66%> (ø) ⬆️
#single     41.16% <0%> (-0.18%) ⬇️

Impacted Files                 Coverage Δ
pandas/core/sparse/frame.py    95.63% <100%> (ø) ⬆️
pandas/core/ops.py             94.68% <100%> (ø) ⬆️
pandas/util/testing.py         90.6% <100%> (-0.11%) ⬇️
pandas/core/arrays/sparse.py   92.71% <40%> (+0.01%) ⬆️
pandas/io/gbq.py               78.94% <0%> (-10.53%) ⬇️
pandas/core/frame.py           97.02% <0%> (-0.12%) ⬇️


Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 421ae9d...1865863.


codecov bot commented May 16, 2019

Codecov Report

Merging #26421 into master will increase coverage by <.01%.
The diff coverage is 70%.


@@            Coverage Diff             @@
##           master   #26421      +/-   ##
==========================================
+ Coverage   91.74%   91.75%   +<.01%     
==========================================
  Files         174      174              
  Lines       50763    50754       -9     
==========================================
- Hits        46575    46567       -8     
+ Misses       4188     4187       -1

Flag        Coverage Δ
#multiple   90.26% <70%> (ø) ⬆️
#single     41.71% <10%> (-0.08%) ⬇️

Impacted Files                      Coverage Δ
pandas/core/internals/managers.py   93.93% <ø> (ø) ⬆️
pandas/core/sparse/frame.py         95.63% <100%> (ø) ⬆️
pandas/util/testing.py              90.7% <100%> (+0.1%) ⬆️
pandas/core/ops.py                  94.68% <100%> (ø) ⬆️
pandas/core/arrays/sparse.py        93.08% <50%> (+0.38%) ⬆️
pandas/io/gbq.py                    78.94% <0%> (-10.53%) ⬇️
pandas/core/frame.py                97.02% <0%> (-0.12%) ⬇️


Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 44d5498...fb3aebe.

@@ -2272,10 +2272,10 @@ def _cast_sparse_series_op(left, right, opname):
     # TODO: This should be moved to the array?
     if is_integer_dtype(left) and is_integer_dtype(right):
         # series coerces to float64 if result should have NaN/inf
-        if opname in ('floordiv', 'mod') and (right.values == 0).any():
+        if opname in ('floordiv', 'mod') and (right.to_dense() == 0).any():
Contributor

Should we not generally be using np.asarray rather than .to_dense()?

Member Author

Both are equivalent, although to_dense actually does a bit less: it specifies the dtype, whereas asarray does some inference (not sure how much that difference matters, though).

@jorisvandenbossche
Member Author

cc @TomAugspurger since you are most familiar with Sparse nowadays... (although reluctantly :-))

Removing this also helps to further disentangle the get_values / values mess: SparseArray is still the only array with .values, and in some places we do hasattr or getattr on 'values', which then catches SparseArray.
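A small, hypothetical sketch of how generic duck-typing on a `values` attribute ends up touching a sparse array's deprecated `.values`. `ToySparse`, `ToySeries`, `extract_values_naive` and `extract_values_guarded` are made-up names for illustration, not pandas internals. Handling the sparse case first and densifying explicitly avoids the deprecated path.

```python
import warnings


class ToySparse:
    """Hypothetical sparse array whose .values is deprecated."""

    def __init__(self, dense):
        self._dense = list(dense)

    def to_dense(self):
        return list(self._dense)

    @property
    def values(self):
        warnings.warn("SparseArray.values is deprecated", FutureWarning, stacklevel=2)
        return self.to_dense()


class ToySeries:
    """Hypothetical container where .values is the intended accessor."""

    def __init__(self, data):
        self.values = list(data)


def extract_values_naive(obj):
    # Duck-typing on "values" also catches the sparse array and warns.
    return getattr(obj, "values", obj)


def extract_values_guarded(obj):
    # Handle the sparse case explicitly before falling back to .values.
    if isinstance(obj, ToySparse):
        return obj.to_dense()
    return getattr(obj, "values", obj)


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    extract_values_naive(ToySparse([1, 0, 2]))    # goes through .values
    extract_values_guarded(ToySparse([1, 0, 2]))  # goes through to_dense()
print(len(caught))  # 1
```

Note that `getattr(obj, "values", default)` only swallows AttributeError, so when warnings are turned into errors the naive helper fails outright instead of falling back.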

@TomAugspurger
Contributor

+1

Looks like there are still a few warnings: https://dev.azure.com/pandas-dev/pandas/_build/results?buildId=11542&view=logs&jobId=521b7dfd-2989-5ff8-bc8c-7481906480fa&taskId=07b8d9d4-6363-5e2d-bc2b-146a30521256&lineStart=154&lineEnd=154&colStart=109&colEnd=115

My other PR is adding

filterwarnings =
    error:Sparse:FutureWarning

to our setup.cfg. If you make the warning message start with something like "SparseArray.values", these warnings would be elevated to errors too (not sure if we want that or not).
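As far as I understand pytest's handling of ini filters, the `error:Sparse:FutureWarning` entry is passed on to Python's `warnings.filterwarnings`, where the message part is a regular expression matched case-insensitively against the *start* of the warning text. The sketch below reproduces that with the standard library to show which messages the `Sparse` pattern would turn into errors. The helper name `becomes_error` and the message strings are illustrative only.

```python
import warnings


def becomes_error(message):
    """Return True if `message` would be raised as an error under the
    stdlib equivalent of `error:Sparse:FutureWarning`."""
    with warnings.catch_warnings():
        # Silence everything else so non-matching warnings are not printed.
        warnings.simplefilter("ignore")
        # Inserted in front of the "ignore" filter, so it is checked first.
        # The message regex is compiled with re.IGNORECASE and matched at
        # the start of the warning text.
        warnings.filterwarnings("error", message="Sparse", category=FutureWarning)
        try:
            warnings.warn(message, FutureWarning)
        except FutureWarning:
            return True
        return False


print(becomes_error("SparseArray.values is deprecated"))                # True
print(becomes_error("The SparseArray.values attribute is deprecated"))  # False
```

So the wording of the deprecation message decides whether this filter catches it: only messages that begin with "Sparse" are escalated.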

@jorisvandenbossche
Member Author

Ah, I missed the apply ones.
It's quite annoying that the output on our CI does not show which test is causing it (due to using xdist).

There is one (that I actually already knew about, but for now ignored) that is not that easy to solve: the json code (ujson/python/objToJSON.c) checks in C for a 'values' attribute to get the values out of dataframe / series / index etc.

@jorisvandenbossche
Member Author

@TomAugspurger @jreback can you take another look? I added some extra compat code in the Cython/C code.

@@ -28,6 +28,14 @@ cdef _get_result_array(object obj, Py_ssize_t size, Py_ssize_t cnt):
return np.empty(size, dtype='O')


cdef bint _is_sparse_array(object obj):
Contributor

Would this be better suited for pandas._libs.util? Or should we keep it here, since this is the only file using it and it's temporary?

Member Author

@jorisvandenbossche jorisvandenbossche May 20, 2019


Yes, exactly for those reasons (it's only used here, and should be removed again once we get rid of this deprecation), I would keep it here (it's not meant to be a general utility).

Contributor

This is not the right location; it should be in util.
Your argument is not correct: just because we will eventually remove it does not mean it shouldn't be with similar code.

@TomAugspurger
Contributor

TomAugspurger commented May 20, 2019 via email

@jorisvandenbossche jorisvandenbossche merged commit d3a1912 into pandas-dev:master May 21, 2019
@jorisvandenbossche jorisvandenbossche deleted the depr-sparse-values branch May 21, 2019 06:23
Contributor

@jreback jreback left a comment


Not really sure of the urgency here, @jorisvandenbossche.

I have some comments, and will fully review at some point.


@@ -28,6 +28,14 @@ cdef _get_result_array(object obj, Py_ssize_t size, Py_ssize_t cnt):
return np.empty(size, dtype='O')


cdef bint _is_sparse_array(object obj):
# TODO can be removed once SparseArray.values is removed (GH26421)
if hasattr(obj, '_subtyp'):
Contributor

@jreback jreback May 21, 2019


This idiom should use getattr instead.
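For reference, here is a plain-Python sketch of the two idioms. The hasattr version mirrors the structure of the Cython helper in the diff. The getattr version folds the existence check and the lookup into a single call with a default. The marker value `'sparse_array'` is an assumption, because the diff above is cut off before the comparison (it is the `_subtyp` pandas used for SparseArray at the time). The toy classes are hypothetical.

```python
class ToySparseArray:
    # Hypothetical class tagged the way pandas tags its objects (assumed value).
    _subtyp = "sparse_array"


class ToyOther:
    # Hypothetical object carrying a different marker.
    _subtyp = "other"


def is_sparse_array_hasattr(obj):
    # Two lookups: an existence check, then the attribute access.
    if hasattr(obj, "_subtyp"):
        if obj._subtyp == "sparse_array":
            return True
    return False


def is_sparse_array_getattr(obj):
    # One lookup, with a default for objects that lack the attribute.
    return getattr(obj, "_subtyp", None) == "sparse_array"


for obj in (ToySparseArray(), ToyOther(), [1, 2], None):
    print(type(obj).__name__, is_sparse_array_getattr(obj))
```

Both functions return the same answers. The getattr form is shorter and avoids looking up the attribute twice.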

@jorisvandenbossche
Member Author

jorisvandenbossche commented May 21, 2019

Sorry, there was no urgency at all. I just thought for a moment that Tom's review was enough, and wanted to get this PR over with. I will wait for your full review before doing any fixup.

@jreback
Contributor

jreback commented May 26, 2019

@jorisvandenbossche my main comment was that _is_sparse_array needs to be in util.pyx (it doesn't matter that we will eventually remove it); it's in the wrong place.

There is slight confusion over whether we recommend .to_dense() or np.array() for conversions. We should try to be consistent (maybe just deprecate .to_dense()), but that's another issue (maybe create one).

Labels
Deprecate Functionality to remove in pandas

3 participants